Eagle is a family of vision-centric, high-resolution multimodal large language models (MLLMs). It improves visual perception by fusing features from multiple vision encoders and from different input resolutions.
Multimodal Fusion
Transformers
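
To make the idea of fusing several vision encoders concrete, here is a minimal sketch of one common fusion strategy: resample each encoder's feature grid to a shared size, then concatenate the channels token by token. The description above does not say how Eagle performs the fusion, so the strategy, the function names (resize_nearest, fuse_by_channel_concat, toy_feature_map), and the toy shapes are all assumptions made for illustration. It uses plain Python lists rather than a tensor library so that it runs without dependencies.

```python
# Illustrative sketch only: the function names, shapes, and the choice of
# channel concatenation are assumptions, not Eagle's actual implementation.

def resize_nearest(grid, out_h, out_w):
    """Resample an H x W grid of feature vectors to out_h x out_w (nearest neighbor)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_by_channel_concat(feature_maps, out_h, out_w):
    """Align every encoder's grid to one size, then concatenate channels per token."""
    aligned = [resize_nearest(fm, out_h, out_w) for fm in feature_maps]
    return [
        [sum((fm[r][c] for fm in aligned), []) for c in range(out_w)]
        for r in range(out_h)
    ]


def toy_feature_map(h, w, channels, fill):
    """Stand-in for a vision encoder output: an h x w grid of constant vectors."""
    return [[[fill] * channels for _ in range(w)] for _ in range(h)]


# Two hypothetical encoders working at different resolutions:
# a 2x2 grid with 3 channels and a 4x4 grid with 2 channels.
low_res = toy_feature_map(2, 2, 3, 0.1)
high_res = toy_feature_map(4, 4, 2, 0.9)

fused = fuse_by_channel_concat([low_res, high_res], 4, 4)
tokens = [vec for row in fused for vec in row]  # flatten the grid into a token sequence
print(len(tokens), len(tokens[0]))  # 16 tokens, each with 3 + 2 = 5 channels
```

In a real model the fused tokens would typically pass through a projection layer before reaching the language model; that step is left out here to keep the sketch focused on the fusion itself.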